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Spatially explicit data layers of tree species assemblages, referred 
to as forest types or forest type groups, are a key component in 
large-scale assessments of forest sustainability, biodiversity, timber 
biomass, carbon sinks and forest health monitoring. This paper ex- 
plores the utility of coupling georeferenced national forest inventory 
(NFI) data with readily available and spatially complete environ- 
mental predictor variables through spatially- varying multinomial lo- 
gistic regression models to predict forest type groups across large 
forested landscapes. These models exploit underlying spatial associ- 
ations within the NFI plot array and the spatially-varying impact 
of predictor variables to improve the accuracy of forest type group 
predictions. The richness of these models incurs onerous computa- 
tional burdens and we discuss dimension reducing spatial processes 
that retain the richness in modeling. We illustrate using NFI data 
from Michigan, USA, where we provide a comprehensive analysis of 
this large study area and demonstrate improved prediction with as- 
sociated measures of uncertainty. 

1. Introduction. Forest type is a classification of forestland based on 
the plurality of species of all live trees that contribute to stocking. Stocking, 
in turn, is a measure of actual forest stand density relative to the density 
considered optimal for a desired purpose, such as site occupancy or volume 
growth [Stage (1969)]. A forest type group (FTG) is an assemblage of forest 
cover types that share closely associated forest species or site requirements. 

Forest area by tree species composition classes, such as forest types and 
FTGs, has received increased attention in recent years as an indicator of 
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forest sustainability and biodiversity. The Ministerial Conference on the 
Protection of Forests in Europe (MCPFE 2008: http://www.mcpfe.org) 
includes area by forest type as an indicator for a criterion related to main- 
taining forest resources, and the Montreal Process (Montreal Process 2005: 
www.rinya.maff.go.jp/mpci) includes the same indicator for criterion related 
to maintaining ecosystem biodiversity and forest productivity. Action E43 
(Harmonization of the national forest inventories of Europe) (COST E43 
2007: www.metla.fi/eu/cost/e43) of the European program of Cooperation 
in the field of Scientific and Technical Research has selected forest type as a 
core variable for biodiversity assessments. Commercial enterprises in North 
American, Mediterranean, central European and Nordic countries also rely 
on spatially explicit regional and national forest assessments by type to sup- 
port decisions regarding establishment or expansion of facilities (e.g., paper 
mills) and for long-term forecasts of wood fiber supplies for burgeoning en- 
ergy bioeconomies. 

National forest inventories (NFI) conducted in North America, Europe 
and elsewhere are the most important sources of comprehensive information 
for assessing FTGs for large geographic domains. Because complete enumer- 
ative inventories are prohibitively expensive, NFIs sample populations of in- 
terest and report plot-based estimates of forest resources. For valid sampling 
designs and corresponding estimators, these plot-based approaches produce 
asymptotically unbiased estimates of area by forest types and FTGs. How- 
ever, these approaches are unable to depict spatial distributions of forest at- 
tributes and do not easily incorporate ancillary variables or complex spatial 
dependence structures to improve the accuracy and precision of parameter 
estimates and/or prediction. Therefore, model-based approaches to map- 
ping are attracting greater interest. Natural resource mapping initiatives 
typically entail constructing statistical models of relationships between land 
cover attributes and variables including soil, climatic, topographic and satel- 
lite image spectral variables. These models are then used to produce digital 
data layers of small-area spatially explicit prediction across large domains, 
which ultimately support the end-user analyses described above. 

Spatial process models [e.g., Cressie (1993); Stein (1999)] to analyze NFI 
data have, hitherto, focused largely upon continuous outcomes such as pre- 
diction of biomass [e.g., Finley et al. (2008a)]. A classic paper by Diggle, 
Tawn and Moyeed (1998) discussed the use of spatial process models for 
non-Gaussian data within the framework of generalized linear models. Hea- 
gerty and Lele (1998) considered a composite likelihood approach to binary 
spatial regression. Recently, Finley, Banerjee and McRoberts (2008b) ex- 
plored a spatial logistic regression model to predict forested areas. Our data 
here concerns categorical (seven FTG's) rather than binary outcomes. Fur- 
thermore, realizations of a given FTG will exhibit different composition and 
proportion of species. For instance, two oak dominated plots assigned to the 
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oak/hickory FTG might have different water or temperature requirements 
due to the type of oak species present and proximity along environmental 
gradients (e.g., springtime water availability or minimum annual tempera- 
ture). We, therefore, find it attractive to allow the regression coefficients to 
vary by location, envisioning a spatial surface associated with each coeffi- 
cient. For instance, we could model the spatial surface for the coefficient 
parametrically — using perhaps a polynomial surface function. Such speci- 
fications are often too arbitrary and may lead to a range of surfaces too 
inflexible for our purposes. In related work, Fahrmeir and Lang (2001) and 
Kneib and Fahrmeir (2006) consider semiparametric regression with splines 
and Markov random fields to model spatial effects. Modeling multiple regres- 
sion coefficient processes jointly, however, would require multivariate speci- 
fications of splines that may be awkward. Instead, we treat each regression 
coefficient surface as a realization from a continuous spatial process. A mul- 
tivariate spatial process is arguably more natural and at least as flexible 
here. 

We use spatially-varying multinomial logistic regression models to ex- 
ploit, more fully, the spatial proximity of the NFI plot array and the poten- 
tially spatially-varying impact of the predictor variables on the response to 
improve the accuracy and precision of FTG prediction at locations where 
we have observed predictors but not inventory plots. We also encounter a 
large number of locations that make full inference computationally oner- 
ous. Modeling large spatial datasets has received much recent attention (see 
Section 4) , but many of the existing approaches are unsuitable for our coef- 
ficient processes. We discuss how a low-rank spatial process can be adapted 
to model the regression coefficients and achieve computational feasibility. 

While most of the models we formulate can possibly be estimated using 
maximum likelihood or variants thereof, we adopt a Bayesian approach [e.g., 
Gelfand et al. (2003)]. This is attractive, as it offers exact inference for the 
random spatial coefficients, and that too with non-Gaussian data, by de- 
livering an entire posterior distribution at both observed and unobserved 
locations. Spatial interpolation for processes that are neither observed nor 
arise as residuals appears inaccessible with classical likelihood-based meth- 
ods. On the other hand, Bayesian model fitting involves rather specialized 
Markov chain Monte Carlo (MCMC) methods [see, e.g., Robert and Casella 
(2005)] that raise concerns about computational expense and reproducibility 
of inference. These concerns have, however, started to wane with the deliv- 
ery of relatively simpler R packages (www.r-project.org), including mcmc, 
MCMCpack, geoRglm and spBayes, that help automate such methods and 
diagnose convergence. 

While our primary contribution here lies in the novel application, we also 
offer several methodological advancements. As mentioned earlier, we extend 
existing spatial logistic models to spatially-varying multinomial regression 
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models. This involves multivariate spatial processes (one for each regres- 
sion coefficient) that we wish to model jointly. This approach is similar to 
Gelfand et al. (2003), but, unlike there, we allow each coefficient process 
to have its own spatial correlation structure. We achieve this using linear 
transformations of independent processes. This idea has been used elsewhere 
[e.g., Finley et al. (2008a)] to model multivariate continuous outcomes, but 
not completely unobserved coefficient processes as we attempt here. 

The remainder of the article proceeds as follows. Section 2 provides an 
overview of the NFI data and study area used to illustrate our proposed 
methods. These descriptions are followed by a brief preliminary analysis, 
the results of which help motivate the models and methods presented in 
Sections 3 and 4. The full analysis of these NFI data using the proposed 
models is detailed in Section 5. In Section 6 we present and discuss the 
results of this analysis. Finally, Section 7 concludes the paper with a brief 
summary and description of future direction and work. 

2. Data. The Forest Inventory and Analysis (FIA) program of the U.S. 
Forest Service conducts the NFI of the USA. The program has established 
field plot centers in permanent locations using a sampling design that pro- 
duces an equal probability sample [Bechtold and Patterson (2005)]. The 
sampling design is based on a tessellation of the USA into approximately 
2400 ha hexagons and features a permanent plot at a randomly selected lo- 
cation within each hexagon. The State of Michigan in which the study area 
is located provided additional funding to triple the sampling intensity to 
approximately one plot per 800 ha. Each plot consists of four 7.32 m radius 
circular subplots for a total area of 672 m 2 . The subplots are configured as a 
central subplot and three peripheral subplots with centers located at 36.58 
m and azimuths of 0°, 120° and 240° from the center of the central subplot. 
In general, locations of forested or previously forested plots are determined 
using global positioning system receivers, whereas locations of nonforested 
plots are verified using aerial imagery and digitization methods. 

Field crews observe species and measure diameter at-breast-height (dbh) 
(1.37 m) and height for all trees with dbh of 12.7 cm or greater and assign 
forested portions of subplots to forest types based on visual assessments. For- 
est types are also assigned to forest portions of subplots using algorithms 
based on measures of density and stocking calculated from species observa- 
tions and diameter and height measurements of all plot trees. To mitigate 
the effects of plot location errors and eliminate the difficulties associated 
with plots to which multiple FTGs were assigned, we use only the central 
subplot to which a single FTG had been assigned. 

2.1. Study area. The study area is the 79,094.17 km 2 forested land of 
Michigan. Due to proximity to the Great Lakes and numerous episodes of 
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glaciation, this region exhibits a host of climate and soil characteristics that 
have produced diverse forest communities. Precipitation and temperature 
extremes are well-known factors in determining spatial patterns of forest 
species composition in the Great Lakes states [Albert (1995)]. Therefore, 
the predictor variables we consider include a set of long-term climate data 
and a soil drainage index. We obtained raster data layers of mean annual 
precipitation (PRECIP), temperature minimum (TMIN) and temperature 
maximum (TMAX) over the period 1971-2000, and average annual snowfall 
(SNOW) over the period of 1961-1990. These data were generated by the 
PRISM climate mapping project [Daly et al. (2000)]. Recently, Henne, Hu 
and Cleland (2007) showed that from a suite of long-term monthly climate 
and soil composition variables, lake-effect snowfall abundance contributes 
the most to explaining spatial variation in mesic tree species (i.e., species 
such as sugar maple or beech that are found on sites with moderate soil mois- 
ture) within the Lake State region. These authors also suggest that spring 
lake-effect snow provides moisture to course-textured xeric soil, allowing 
mesic forest types to become established on these otherwise droughty soils. 
The long-term climate patterns and soil characteristics affect tree species 
survival and influence the assemblage of species found on a given site [Host 
et al. (1988)]. 

The soil drainage index (DI) raster data layer was formulated to mimic the 
quantity of water present in a soil and is available to plants under normal, 
long-term climatic conditions, including water under saturated and unsat- 
urated conditions [Schaetzl (1986)]. The DI variable ranges from for the 
driest soils to 99 for open water. The PRISM data layers were originally 
generated at ~0.8x0.8 km pixel resolution, but were resampled to match 
the 30x30 m resolution of the DI data layer. We make this simplifying as- 
sumption because the disparity between the data layer resolution is small 
compared to the distance between FIA plot observations. Finally, all predic- 
tor data layers were reprojected to share the projection of the georeferenced 
forest inventory plots (Figure 1). Here we see strong spatial dependence 
among the predictor variables, for instance, north to south gradients in pre- 
cipitation and temperature extremes and gradients in mean temperature 
minimum and mean snow depth that are perpendicular to the shorelines. 

Figure 2(a) illustrates the georeferenced forest inventory data consist- 
ing of 5180 forested FIA plots measured between 1999 and 2006 that meet 
our inclusion criterion. For this analysis, the FTGs of interest and associ- 
ated relative percent observed across the inventory plots are white/red/jack 
pine (10.89%), spruce/fir (14.56%), oak/pine (2.86%), oak/hickory (12.92%), 
elm/ash/cottonwood (6.60)%, aspen/birch (16.99%), and maple/beech/birch 
(35.19%). The left column in Figures 3 and 4 offers interpolated surfaces of 
the FTGs. Note that interpolation is over binary values that indicate the 
presence and absence of the given FTG. The surfaces show strong spatial 
patterns in FTG range and extent. 
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(a) Mean annual precipitation (b) Mean annual temperature max 




o ■ 

(c) Mean annual temperature min 




(d) Soil drainage index (e) Mean annual snow depth 



Fig. 1. Surfaces of the mean zero and unit variance standardized predictor variables 
resampled to a 30x 30 meter resolution across the domain. Lighter colors correspond to 
higher values. 

2.2. Preliminary analyses. We index the georeferenced FIA plots as S = 
{si,...,s n }, where s is a coordinate vector (e.g., longitude and latitude), 
and use a binary outcome Y(si) = 1 or to indicate the presence or ab- 
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(c) Predictive process knots 

Fig. 2. Forest inventory plot locations, plots locations belonging to the maple/beech/birch 
FTG, and candidate predictive process knot configuration. 



sence of a given FTG for each location in S which depends, in part, upon 
predictors/regressors x(s) for a generic plot s. Given this setting, a logis- 
tic regression model is the natural choice for relating the outcome to its 
predictors. It is, however, unrealistic to assume that the model parameter 
values associated with the predictor variables are constant across the study 
area. As noted in Section 1, FTGs are not true categorical variables because 
they are based on somewhat artificial levels of continuous variables, such as 
stocking and proportions of species. For instance, plots labeled aspen/birch 
will exhibit a continuum of aspen and birch tree species composition (e.g., 
from pure aspen to pure birch). Although we expect the assemblage of tree 
species within a given FTG to generally occur together due to shared en- 
vironmental requirements, the observed realization of the FTG across the 
landscape will exhibit some level of heterogeneity in species composition. As 
described above, this is a large domain that contains several broad climate, 
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(a) Observed white/red/jack pine (b) Fitted white/red/jack pine 




(c) Observed spruce/fir (d) Fitted spruce/fir 




(e) Observed oak/pine (f) Fitted oak/pine 



Fig. 3. Interpolated surfaces of the observed and 200 knot spatially-varying coefficients 
model's fitted FTGs. For the observed surfaces the interpolation is over the or 1 binary 
variable of FTG presence/absence and the fitted surfaces are over the probability of FTG 
occurrence. Lighter colors correspond to higher values. The remaining FTG and associated 
fitted values are offered in Figure 4- 



soil origin and topographic gradients. Therefore, within a given FTG, we 
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(a) Observed oak/hickory (b) Fitted oak/hickory 




(e) Observed aspen/birch (f) Fitted aspen/birch 



Fig. 4. Interpolated surfaces of the observed and 200 knot spatially-varying coefficients 
model's fitted FTGs. For the observed surfaces the interpolation is over the or 1 binary 
variable of FTG presence/ absence and the fitted surfaces are over the probability of FTG 
occurrence. Lighter colors correspond to higher values. 



expect regionalized inter-species and intra-species (e.g., genetic adaptation) 
response to predictor variables. 
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We pose a simple-minded analysis to explore variation or, more specifi- 
cally, the nonstationarity of predictor variables' parameter estimates for the 
presence/absence of a single FTG. We tessellate the domain such that each 
tessera holds some minimum number of FIA plots, say, 10, then include a 
tessera random effect on each predictor variable. For this analysis the do- 
main was covered with 68 hexagon tessera indexed by k = 1, ... ,68. For a 
given generic location s the logistic regression model is given by 

(2 1) vis) = ex P( x ( s )^( s )) 

l + exp(x(s)'£(s))' 

where p(s) is the probability that location, or inventory plot, s belongs to 
the given FTG, x(s) is a p x 1 vector that holds an intercept and our set 
of regressors/predictors which include long-term climate and soil variables 
(detailed in Section 2.1), and (3(s) are tessera varying coefficients. We can 
decompose the adaptive f3(s) = (3 + u^, where is the kth tessera's vector 
of random effects (i.e., the tessera within which the given s falls). We assume 

that Ufc *~ M VN(0, ^S>) is a multivariate normal distribution and, for sim- 
plicity, ^ = diag[r 2 ]^ =i . This model was fit for each FTG. We assume that 
the /3's have flat prior distributions and the r 2 's follow an inverse-Gamma 
(IG) with hyperparameters IG(2, 1). Note, with a shape value of 2, the IG 
distribution has infinite variance and is centered on the scale value, which in 
this case is 1. We experimented with a range of scale values but saw negligi- 
ble change in the final parameter estimates. Three MCMC chains of 75,000 
iterations were run for each model. Then posterior inference was based on 
3 x 50,000 = 150,000 post burn-in samples. 

For brevity, Table 1 shows only parameter estimates for white/red/jack 
pine and spruce/fir FTGs. Several predictor variables' parameter estimates 
for these and the other FTGs are significant at the 0.05 level. Also, the 
relative magnitude of the variance terms suggests that we should consider 
models that explicitly accommodate spatial nonstationarity of /3. This idea 
is further supported by the presence of clustering seen in maps of f3 spe- 
cific random effects; see, for example, maps for red/ white/jack pine FTG 
in Figure 5. Similar clustering patterns are seen when mapping the u for 
the other FTGs as well. Given dependence structures capable of modeling 
this potential spatial nonstationarity, we will improve model fit and predict 
FTG with greater accuracy and precision across the study area. 

3. Models. 

3.1. Spatially-varying multinomial logistic regression models. As described 
in Section 1, using the tree species composition and relative stocking, FTGs 
are assigned to inventory plots. Our initial interest lies in modeling the 
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(e) URWJ; SDI (f) MRWJ; SNOW 



Fig. 5. Hexagon specific intercept and predictor variable random effects for the 
red/white/jack pine (RWJ) FTG (values scaled by 100). 

probability of a given plot belonging to a specific FTG. We can obtain these 
probabilities by using J — 1 baseline-category logistic regressions models 
[e.g., Agresti (2002), Chapter 7]. For a generic location s, we consider J — 1 
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Table 1 

Parameter credible intervals for the red/white/ jack pine (RWJ) and spruce/fir (SF) 
FTGs for the hexagon specific random effects models 



Parameter 


2.5% 


50% 


97.5% 


Parameter 


2.5% 


50% 


97.5% 


/3RWJ; 


-2.06 


-1.76 


-1.50 


r RWJ 





0.27 


0.46 


0.75 


/3RWJ; PRECIP 


-0.55 


-0.19 


0.18 


r RWJ 


PRECIP 


0.24 


0.41 


0.68 


/3rWJ; TMAX 


-0.19 


0.12 


0.44 


_2 

T HW.I 


TMAX 


0.32 


0.52 


0.94 


PwN3\ TMIN 


-0.75 


-0.43 


0.03 


t-2 


TMIN 


0.14 


0.34 


0.86 


fewJ; SDI 


-1.02 


-0.76 


-0.47 


r RWJ 


SDI 


0.45 


0.66 


0.98 


/3rWJ; SNOW 


-0.69 


-0.40 


-0.02 


r RWJ 


SNOW 


0.17 


0.26 


0.42 





-2.47 


-2.15 


-1.89 


T 2 
r SF 





0.22 


0.41 


0.63 


/3sF; PRECIP 


-0.88 


-0.53 


-0.23 


_2 
r SF 


PRECIP 


0.19 


0.31 


0.50 


/3sF; TMAX 


-1.27 


-0.85 


-0.45 


_2 
r SF 


TMAX 


0.22 


0.41 


0.74 


/?SF; TMIN 


—1.31 


-0.84 


-0.49 


_2 
r SF 


TMIN 


0.19 


0.33 


0.60 


PSF; SDI 


1.57 


1.79 


2.10 


_2 
r SF 


SDI 


0.16 


0.30 


0.56 


PSF; SNOW 


-0.74 


-0.40 


-0.02 


_2 
r SF 


SNOW 


0.15 


0.26 


0.42 



binary outcome variables such that Yj(s) = 1 if the given plot belongs to the 
jth FTG, j = 1, . . . , J - 1, and if it belongs to the Jth (baseline) FTG. 
The spatially-varying baseline-category logistic regressions yield 

( ou f ^ exp(x j (s) / /3(s)) 

(3.1) 7Ti(s) = 7— ; " = , 

3 l + y: fc t 1 1 exp(x fc (sy/3 fc (s))' 

where 7Tj(s) is the probability that location s belongs to the jth FTG, xj(s) 
was defined previously, and /3(s) denotes spatially-varying coefficients. Then, 
given the J — 1 models, the set of probabilities for s are completed by cal- 
culating 7rj(s) = 1 — J2j=i Kjis)- The baseline category is customarily set to 
the most commonly occurring and spatially pervasive FTG. 

We decompose the coefficients as 0j(s) = f3j +wj(s), where fij's represent 
the nonspatial regression coefficients, as in customary logistic regression, 
while Wj(s) is a multivariate spatial process. More generally, we can write the 
regression component as Xj(s)' /9+z,-(s) / w,-(s) ) where Zj(s) = x,-(s) for a fully 
spatially-varying coefficients candidate model, or could be a subvector of 
Xj(s) representing those predictors whose impact is posited to vary spatially; 
for instance, a spatially varying intercept model will correspond to Zj(s) 
being equal to 1. In principle, we could further hypothesize dependence 
across the j categories. This, however, is not intuitive in our context and 
further complicates the modeling without clear gains. Hence, we assume 
Wj(s)'s are independent across j, but each Wj(s) is a vector of correlated 
processes. 

3.2. Multivariate process models. Multivariate spatial processes, such as 
each Wj(s), are completely characterized by their mean and a cross-covariance 
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(matrix) function (suppressing the suffix on categories), C w (si,S2;0) = 
cov{w(si), w(s2)}. Valid constructions using convolutions of kernels or cor- 
relation functions are possible [Ver Hoef and Barry (1998); Gaspari and 
Cohn (1999)]. An attractive, easily interpretable and flexible approach de- 
velops versions of the linear model of coregionalization (LMC) as in, for 
example, Grzebyk and Wackernagel (1994), Wackernagel (2006), Schmidt 
and Gelfand (2003) or Gelfand et al. (2004). See, also, Reich and Fuentes 
(2007) for a Bayesian nonparametric adaptation. 

In the LMC approach we let w(s) = A(s)v(s) be a spatial linear transfor- 
mation, where v(s) = (v\(s), . . . ,v p (s))' and each Vi(s) is an independent 
spatial process with unit variance and correlation function pi (si , S2 ; Oi) ■ 
Thus, v(s) has a diagonal cross-covariance matrix C v (si,S2) with ith diag- 
onal element as Pi(si,S2',6i) yielding a valid nonstationary cross-covariance 
C w (si,S2) = A(si)C v (si,S2)A(s2)' for w(s). A flexible choice for each 
Pi(sx,S2',9i) is the Matern correlation function, which allows control of spa- 
tial range, <j), and smoothness, v [Stein (1999)] and is given by 

(3.2) p(s, s' ; 0, v) = ^4>y (II s " AWKu(\\s - s'||; 0); 

(f>>0,v>0. 

In general, w(s) is nonstationary even when v(s) is stationary. When A(s) = 
A, w(s) inherits stationarity from v(s): C w (si — S2) = AC v (si — S2)A'. 

1 /2 

Since C w (s!,s 2 ) = A(s 1 )A(s 2 ) / , one can assume that A(s) = C» (s, s) is 
a lower-triangular square-root; the one-to-one correspondence between the 
elements of A(s) and C w (s,s) is well known [e.g., Harville (1997), page 
229]. Stationarity implies A(s) = A and that C w (0) = AA'. Here we could 
either assign a prior, for example, inverse Wishart, to AA' or could further 
parameterize it in terms of eigenvalues and the Givens angles which are 
themselves assigned hyperpriors [Daniels and Kass (1999)]. 

Once a valid cross-covariance function is specified for a multivariate Gaus- 
sian process, the realizations of w(s) over the set of observed locations S is 
given by N(Q, £ w (0)), where £ w (0) is an np x np block matrix whose (i, j)th 
block is the p x p cross-covariance C w (sj, Sj-; 9), i,j = l,...,n. Without fur- 
ther specifications, estimating (3.1) will involve computing the inverse and 
determinant of the dense matrix £ w (0). Such computations invoke solvers 
or factorizations of complexity 0(n 3 p 3 ), not once but iteratively, to produce 
estimates of 6. With large n, this is computationally infeasible. 

4. Multivariate process models for large datasets. Modeling large spa- 
tial datasets observed over irregular locations has received much attention in 
the recent past. One approach employs spectral approximations to the like- 
lihood [e.g., Stein (1999) and references therein; Fuentes (2002)], thereby 
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avoiding large matrix computations [Paciorek (2007); Fuentes (2007)]. This 
works best assuming some form of stationarity and does not easily adapt to 
multivariate coefficient processes. Stein, Chi and Welty (2004) improve upon 
an idea of Vecchia (1988) [also, see Jones and Zhang (1997)] that approxi- 
mates the likelihood with a product of appropriate conditional distributions. 
This yields a joint distribution, but not a process; hence, spatial interpola- 
tion is somewhat cumbersome. While promising, it is yet to be methodi- 
cally explored for non-Gaussian and multivariate likelihoods such as ours. 
Yet another approach, known as "covariance tapering," considers compactly 
supported correlation functions [Furrer, Genton and Nychka (2006); Gneit- 
ing (2002)] that yield sparse correlation matrices. Efficient sparse solvers 
can then be devised for kriging and variance estimation, but these tapered 
functions may limit the scope of the models; also, full likelihood-based infer- 
ence still requires determinant computations that may not be easily avail- 
able. Recently Rue, Martino and Chopin (2009) propose a promising INLA 
(Integrated Nested Laplace Approximation) algorithm as an alternative to 
MCMC that delivers fast posterior approximations. The method's efficiency 
depends upon a Gaussian Markov random field approximation and may not 
be ideal with several hyperparameters (e.g., the matrix A) such as ours. 

4.1. Predictive process models. In principle, we might have adapted any 
of the above approaches to reduce the dimensionality of our model. Seeking 
a more seamless transition to multivariate processes, however, we opt for 
a class of models that emerges from representations of the spatial process 
in a lower-dimensional subspace and easily adapt to multivariate processes. 
These are often referred to as "low-rank" or "reduced-rank" spatial models 
and have been explored in different contexts [Lin et al. (2000); Stein (2007, 
2008); Cressie and Johannesson (2008); Banerjee et al. (2008); Crainiceanu, 
Diggle and Rowlingson (2008)]. Many of these methods are variants of the 
so-called "subset of regressors" methods used in Gaussian process regressions 
for large data sets in machine learning [e.g., Wahba (1990); Rasmussen and 
Williams (2006) and references therein]. The idea here is to consider a set 
of locations, or "knots," say, S* = {s\, . . . , s**}, where the number of knots 
is much smaller than the number of observed locations, and to represent 
the spatial process realizations over S in terms of the realizations over the 
smaller set of knots. Specifically, we define w(s) = C(s; 0)'£^(0) w*, where 
C(s;0)' is the p x n*p block matrix with C w (s,s|) being the jth block, 
E^(0) is the n*p x n*p block matrix whose (i,j)th block is C w (s*,s*) and 
w* = (w(s|)',...,w(s*)')' denotes a realization of the process w(s) over 
S* . Banerjee et al. (2008) call w(s) the predictive process derived from the 
parent process w(s). The appeal behind this formulation is that every spatial 
process (parent) model produces a corresponding predictive process model. 
In our subsequent models, we will assume a stationary cross-covariance for 
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the parent process, that is, C w (si — S2) = AC v (si — S2)A' and, in particular, 
C w (s,s) = C w (0) = AA'; the predictive process will still be nonstationary. 

The dimension reduction is seen immediately. In fitting the predictive 
process counterparts of (3.1), the np x 1 vector w = (w(si)', . . . , w(s„)') / is 
replaced by Z(0)w*, where Z(0) = C(0)'£*(0) _1 with C(6)' being an np x 
n*p block matrix with C w (sj, s*) as its (i, j)th block. Therefore, Z(0) is np x 
n*p and we work with an n*p-dimensional joint distribution only. Evidently, 
the parent model in (3.1) is different from its predictive process counterpart. 
Hence, though we introduce the same set of parameters in both models, they 
will not be identical. This approach leads to a different parameterization 
from that of low-rank smoothing splines [e.g., Lin et al. (2000); Kamman 
and Wand (2003); Crainiceanu, Diggle and Rowlingson (2008)], but yields 
the same joint marginal distribution for process realizations. 

Stein (2007, 2008) undertakes an exploration of a subset of regressors 
methods, pointing out its pitfalls for data with fine-scale variation — a con- 
sequence, perhaps, of the fact that information for the covariances tends 
to concentrate at shorter lags. Indeed, the predictive process tends to over- 
smooth and also underestimates the spatial variance component. For multi- 
variate processes, this follows from the following inequality: 

var{w(s)} - var{w(s)} = C w (s, s) - C(s, 6>)'S*(6>)~ 1 C(s, 0) 

= var{w(s)|w*} y 0, 

where var{-} denotes the variance-covariance matrix and y indicates non- 
negative definiteness. Equality holds only when the knots coincide with the 
spatial locations, whence the predictive process realizations coincide with 
those from the parent process. The univariate analogue of the above argu- 
ment shows that v&r(wk(s)) < var(u;fc(s)) for each element k = 1, . . . ,p. 
A remedy for the bias is to add a spatially- varying noise to form Wg(s) = 

w(s)+e(s), where e(s) ~ d N(0, C w (s, s) -C(s, 0)'£*(0)~ 1 C(s, 0)). This "cor- 
rection" yields varjw^s)} = var{w(s)} as desired and -E{wg(s)jw*} = w(s), 
so w E -(s) inherits the attractive approximation properties of w(s). Also, no 
new parameters are introduced, ensuring identifiability, and the computa- 
tional benefits are retained since e = (e(si)', . . . , e(s n )')' has a block-diagonal 
variance-covariance matrix. In typical geostatistical models for continuous 
outcomes, e(s) is referred to as the "nugget" and is used to capture mea- 
surement error or micro-scale variability. Notice that such "nugget" effects 
do not arise naturally in generalized linear models, such as ours, and should 
not be interpreted as such. In fact, we do not introduce any "new" variance 
parameter for the nugget. Rather, e(s) has a very special variance structure 
that adjusts for the bias in the spatial variance, diminishes the excessive 
smoothness of w(s) and, in our experience, considerably improves model-fit 
and robustness to fine-scale variation. 



16 



A. O. FINLEY, S. BANERJEE AND R. E. MCROBERTS 



We conclude this section with some brief remarks on knot selection. With 
a fairly even distribution of data locations, one possibility is to select knots 
on a uniform grid overlaid on the domain. A design-based approach that min- 
imizes a spatially averaged predictive variance criterion [e.g., Zhu and Stein 
(2006); Diggle and Lophaven (2006)] can be used. With irregular locations, 
however, we may encounter substantial areas of sparse observations where 
placing would amount to "wastage," possibly leading to inflated variance es- 
timates and slower convergence. More practical space-covering designs [e.g., 
Royle and Nychka (1998)] can yield a representative collection of knots that 
better cover the domain. We can also apply other popular clustering al- 
gorithms such as k-means or partitioning around medoids algorithms [e.g., 
Kaufman and Rousseeuw (1990)]. Implementations of these algorithms are 
available in R packages such as fields and cluster and have been used in 
spline-based low-rank kriging models [Ruppert, Wand and Carroll (2003)]. 

4.2. Model fitting details. Let w e ~ = (wg(si)', . . .,w £ -(s n) 1 )' denote the re- 
alizations from the noise-added predictive process over the set of observed 
locations in S. This follows an np x 1 multivariate normal distribution ~ 
JV(O,Ew,(0)), Sw e ~(0) = £e(0) + C(0)'£*(0) _1 C(0), and £ e ~(0) is a block- 
diagonal matrix whose ith diagonal is given by C w (sj, Sj) — C(s,; 0)'C*(0) _1 C(s, 
Estimation could proceed with a multinomial likelihood that estimates the 
7Tj(sj)'s in (3.1) from the posterior 

J-l ( n J \ 

p({^.,^,w,.}iy)oc nfrtwfliM*^)} >< nn^( s ^ (si) > 

j=i U=ij=i J 

where Y is the vector of observed outcomes. Alternatively, we find it more 
convenient to compute the posterior distributions for J — 1 logistic regression 
models [see Agresti (2002), Chapter 7; Begg and Gray (1984)] and use the 
posterior samples to obtain the posterior distribution of the classifiers in 
(3.1). Now each posterior distribution is given by 

P(%|Y) ocp(/3,)p(0 J )p(w £ -|0 J ) 

n 
i=l 

where % = {fy, Wg^Oj}, logit{^(sj)} = x J -(s i ) / j3 3 -+Zj(s i ) / w^(s i ). Posterior 
estimation will proceed employing MCMC samples that will yield samples 
This will then involve computing the expression in (4.1) featuring 

Sw^flj)" 1 for i = 1, . . . , J - 1 in each iteration of the MCMC. We benefit 
from the Sherman- Woodbury-Morrison (SWM) formula [see, e.g., Hender- 
son and Searle (1981)] to compute Ew^fljO^ 1 as 

moj)- 1 - ^r^o'p'Pj) +c(0 i )£ £ ~(0 i r i c(0,-) / ]- 1 

xCie^iGj)- 1 . 
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Here is block-diagonal, {C w (si, s { ; Oj) - C(s,; J ) / S*(0)- 1 C(s i ; Oj)} 

being the ith block, while the other inversion in the second term is n*p x n*p. 
Likelihood computations also require the determinant of given by 

|Ee(0j)l|i:*(0j) +C{O j )Y l e{O j )- l C{O j )'\/\Y,*{O j )\. Computational gains ac- 
crue because is the product of the determinant of its block-diagonal 
submatrices, while the remaining two determinants are of order n*p. 

Spatial interpolation at a location so can be achieved by composition 
sampling: for each drawn from the posterior, we first draw w~ (so) ~ 

p(w£.(so)|f^), which is multivariate normal with mean /x^_ (so',Qj) = 
Cw gj (so; Oj)' x 2>w g . {Oj)~ l ^e 3 and variance (s ;Oj) = C^ s . (s , s ; Oj) - 
Cwe. ( s o; Oj)' x (flj) -1 Cw f . (so; Oj), where Cy,-. (s ; Oj)' is the 1 x n block 
matrix with ith block given by C(so, 0j)'S*(0j)~ 1 C(sj; <?A The posterior 
distribution of the prediction probability 7Tj(so) is then directly obtained 
from (3.1) using the posterior samples for the J — 1 models 



7Tj( s o) 



(l) exp(x J -(s )'/3r + z^p/w^sp)) 



1 + Eti exp(x fc (s )'/3^ + z fc (s )'w^(s )) 
5. Data analysis. 

5.1. Model validation and benchmark comparisons. The models detailed 
in Sections 3 and 4 were fit to the Michigan FIA data described in Sec- 
tion 2. Because our primary interest is in prediction of FTG, we compare 
the candidate models' ability to predict FTG for a set of 200 holdout (or 
validation) plots that were selected at random from the 5180 FIA plots. 
Prediction for a new (or holdout) plot is based on the prediction distribu- 
tion, 7r(so) = {7Ti(so), • • • ,7rj(so)}. We consider several different scoring rules 
to evaluate the predictive performance of the candidate models. A scoring 
rule provides a summary measure for evaluating a probabilistic prediction 
given the predictive distribution and the observed outcome. In our setting 
the scoring rule function is S(ir,i), where i is the index of the observed 
FTG. Given 200 holdout plots, {s «,}*L°i, 

we can calculate summary statis- 
tics of the scores, for example, the mean score is S = Y^q=i ^iifo ' wnere 
7v q = 7v(sQ q ). In fact, we can obtain the entire posterior distribution of the 
scoring rule [i.e., S(ir®,i), I = 1, . . . , L] and report the posterior summaries. 
Gneiting and Raftery (2007) offer four scoring rules for prediction of cate- 
gorical variables: 

Zero-one: 5(ir,i) = ( !' * = " " " 

L 0, it otherwise, 
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J 

Quadratic: S(tt, i) = 2-Ki — ^ ttj — 1, 

3=1 
7T, 



Spherical: S(ir,i) 



(£/=W) 1/2 ' 

Logarithmic: S(7T,i) = log7Tj. 

Following definitions in Gneiting and Raftery (2007), all the noted scoring 
rules are strictly proper but for the zero-one, which is only proper. The 
zero-one scoring rule uses only a portion of available information, ignoring 
variability in the predictive distribution and returning either a zero or one. 
Similarly, the logarithmic scoring rule considers only one of the probabilities 
in the predictive distribution. 

In addition to these four scoring rules, we examine classification confusion 
matrices, parameter estimates, and the models' ability to produce spatially 
consistent fitted and predicted distributions of FTGs. 

We also compare our spatially-varying multinomial logistic regression 
models to common benchmark methods. Currently, fc-nearest neighbor (k- 
NN) methods are among the most frequently applied for forest attribute 
mapping using NFI data [McRoberts, Nelson and Wendt (2002); Tomppo 
and Halme (2004)]. Using this method, the prediction probability that a new 
location will belong to the j'th FTG is 

7r i( s o) = ^H^(s i )' 



1^0 



where we temporarily redefine y(s) as the index (or label) belonging to 
one of the J distinct FTG's and 5 at b is the Dirac function (S a ,b = 1 if o = 
b, and otherwise). The term ^ k records the proportion of the jih 
FTG in the set of k nearest neighbors, where nearness is defined using a 
distance metric, d(v)> between the predictor variables. Here we consider 
two benchmark models: the first defines nearness as the Euclidean distance 
between geographic coordinates, d(s, s'), and the second calculates Euclidean 
distance between predictor vectors, d(x(s), x(s')). The parameter k is chosen 
by running the algorithm for a range of values, for example, k € 1, . . . ,40, 
then choosing the value that optimizes the specified objective function. 



5.2. Implementation specifics. As detailed in Section 3, the candidate 
multinomial logistic regressions include nonspatial model, spatially-varying 
intercept and spatially- varying coefficients models. Maple/beech/birch is set 
as the regressions' baseline category due to its abundance (35.19% of ob- 
served plots) and pervasiveness [covering the entire domain, see Figure 2(b)]. 
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For estimating predictive process models, we used 154, 200, and 254 knots 
over the domain [e.g., Figure 2(c) shows 200 knots]. We experimented with 
the clustering algorithms noted at the end of Section 3 and the infill de- 
signs suggested by Diggle and Lophaven (2006). Neither provided substan- 
tial changes in model parameter estimates or improvements in prediction. 
We note, however, that this lack of improvement is likely data specific and 
these knot selection approaches should be explored for each new analysis. 

To complete the Bayesian specification, we assign priors to the models' 
parameters. As customary, we use a flat prior on all (3 parameters. For the 
univariate spatial process in the spatially-varying intercept model, priors 
must be specified for, a 2 , and the Matern correlation function's range, <f>, 
and smoothness, u, parameters [i.e., = {a 2 , <p, u}]. For each FTG model, 
we assume that a 2 follows an IG(2, 10) and (ft follows a Uniform prior with 
support [7(5.22 x 10 -6 , 1), which is between 1 and 575,000 m when v = 0.5 
(i.e., about 3/4 of the maximum intersite distance of 766.4 km). Again, with 
a shape value of 2, the IG distribution has infinite variance and is centered 
on the scale value, which in this case is 10. The smoothness parameter was 
set to follow [7(0, 2). Then, as described in Section 3, the nxl w§ follows 
a iWW(O,E # -(0)). 

For the spatially-varying coefficients model, we assumed AA' follows an 
IW(p + l,I p ) prior. We experimented with different diagonal TWs scale 
matrices to assess this prior's influence on posterior distributions. The spa- 
tial range and smoothness parameter for each (3 specific process again follow 
[7(5.22 x 10" 6 , 1) and [7(0, 2) , respectively. The np x 1 w e ~ follows MVN(0, 
with appropriate dimension adjustments to the mean vector and dispersion 
matrix. Details about the Metropolis sampling algorithm are provided in 
the Finley, Banerjee and McRoberts (2009) supplemental article. 

For each model, three MCMC chains were run for 75,000 iterations. The 
sampler was coded in C++ and leveraged Intel's Math Kernel Library 
threaded BLAS and LAPACK routines for matrix computations. The code 
was run on a Linux workstation with two Intel Xeon quad core processors. 
The spatially-varying coefficients model was the most computationally chal- 
lenging, with each chain of the 254 knot model taking ~5 hours to complete. 
The CODA package in R (www.r-project.org) was used to diagnose conver- 
gence by monitoring mixing using Gelman-Rubin diagnostics and autocor- 
relations [see, e.g., Gelman et al. (2004), Section 11.6]. Acceptable conver- 
gence was diagnosed within 25,000 iterations and, therefore, 150,000 samples 
(3 x 50,000) were retained for posterior analysis. 

6. Results and discussion. Table 2 offers parameter estimates for the 
red/white/jack pine multinomial logistic regression candidate models [again, 
due to space limitations, parameter estimates for the remaining five FTGs 
are available in the Finley, Banerjee and McRoberts (2009) supplemental 



Table 2 

Parameter credible intervals, 2.5% 50% 97.5% percentiles, for the red/white/jack pine FTG nonspatial and predictive process candidate 
model. Note, that for brevity, only the diagonal elements of C are provided for the space-varying coefficients model. All <j> parameter 

values are scaled by 10. 5 

Spatial predictive process 

Spatially- varying intercept Spatially- varying coefficients 
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(e) /3rwJ; sdi (f) Prwj: snow 

Fig. 6. Interpolated surfaces of $ for the red/ white/jack pine (RWJ) FTG. Lighter colors 
correspond to higher values. 

article]. For the red/white/jack pine FTG, as with the other FTGs, several 
of the predictors are significant at the 0.05 level. For the nonspatial and 
spatially-varying intercept models the parameters associated with SDI and 
SNOW were significant at the 0.05 level and the signs on these parameter 
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estimates are consistent with the predictor surfaces in Figure l(d, e) and 
observed red/white/jack plots in Figure 3(a). Specifically, the concentration 
of red/ white/jack pine in the northcentral lower peninsula of Michigan cor- 
respond to areas of low snow depth and dry soils. The central most area of 
high red /white/jack pine in Figure 3(a) is predominantly composed of pure 
jack pine and red pine plantations. Recently established jack pine planta- 
tions came about as part of a state incentive program to reestablish habitat 
for the Kirtland's warbler {Dendroica kirtlandii) , which is an endangered 
neotropical migratory bird. Similar connections between observed probabil- 
ity of FTG and predictor variables can be made for the other FTGs offered 
in the Finley, Banerjee and McRoberts (2009) supplemental article. Inter- 
preting the significance of predictor variables' parameters for the spatially- 
varying coefficient models must be done in a spatial context. Rather than 
look at the aspatial (3 slope parameters, which for these models are simply 
the mean over the domain, interpretation should be based on (3, as depicted 
in Figure 6. However, it is important to interpret these parameter estimates 
with caution, recalling that presence/absence of an FTG is relative to the 
baseline category, not all other forested plots. If our focus was to better 
understand the relationship between FTGs and the environmental predictor 
variables, then we should use (2.1), and associated submodels, with the bi- 
nary response of 1 if the FTG of interest is present and otherwise, over all 
forested plots. Because our focus is on achieving high prediction accuracy, 
we do not dwell on interpreting the j3 or j3 and instead consider measures 
of predictive accuracy, model fit and model adequacy. 

We now turn to the results of the 200 plot holdout set analysis. Table 3 
displays scores for the benchmark and candidate models. The mean scores, 
over the 200 holdout plots, are reported for the benchmark models. For the 
multinomial models, fit using MCMC, the median and upper /lower 95% 
credible intervals for the scores' mean posterior distribution are reported. 
For the scoring rules, higher scores indicate superior predictive performance. 
That is, for the zero-one and spherical scoring rules, scores closer to 1 indi- 
cate greater predictive performance, whereas for the quadratic and logarith- 
mic rules, scores closer to suggest improved performance. The fc-NN algo- 
rithm which measures nearness in geographic space, benchmark 1, provided 
the worst prediction on all scoring rules. Benchmark 2, which calculates 
nearness in predictor variable space, performed better than the nonspatial 
multinomial model (except for the logarithmic rule) and comparable to the 
space-varying intercept multinomial model for the quadratic and spherical 
rules. Noting the potential limitation of the zero-one and logarithmic scor- 
ing rules, this suggests that the easily implemented fc-NN algorithm is a 
viable option for similar analyses, if second order properties of the resulting 
map products are not needed. Allowing the regression coefficients to vary 
spatially improved predictive performance over the single spatial random 
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effect model, as reflected in the scores for the spatially-varying coefficients 
models. Increasing knot intensity beyond 200 did not improve predictive 
performance for the spatially-varying intercept and coefficients models. 

The large difference in predictive ability between the nonspatial and spa- 
tial multinomial models reveals a requirement of the forest inventory sam- 
pling design and a limitation of the proposed models. Specifically, the ob- 
served locations must be dense enough to estimate the range of the spatial 
process associated with the intercept and, in the case of the spatially- varying 
regression model, predictor variables. If the data array is sufficiently sparse, 
or predictions are made beyond the range of the spatial influence, the predic- 
tion only learns from the predictor variables and cannot draw information 
from the proximity of the observed locations. For the analysis presented 
here, the FIA plot array is dense and predictions are made well within the 
support of the spatial range of the predictor variables. 

Table 4 offers the confusion matrices for benchmark 2, nonspatial and 
200 knot spatially-varying intercept and coefficients models. Here, predic- 
tion is based on the maximum FTG in the predictive distribution (equiv- 
alent to the zero-one scoring rule). Not surprisingly, these tables suggest 
substantial confusion within the conifer and deciduous FTG. For exam- 
ple, due to similar species composition, there is high misclassification be- 
tween the maple/beech/birch and aspen/birch FTGs. Also, the relatively 
rare oak/pine FTG which exhibits a split of conifer and deciduous species 
was not correctly classified by any of the models. However, moving from the 
benchmark and nonspatial to the 200-knot spatially-varying intercept and 
then to the spatially-varying coefficients model does substantially improve 
prediction. 

The validation analysis supports the use of the spatially-varying coeffi- 
cients model. The use of this model is further corroborated by the presence 
of spatial dependence across the coefficients, which is summarized by the 
estimates of C w (0), and coefficient specific <j) and v offered in Table 2. The 
estimated effective spatial ranges (i.e., the distance at which the spatial cor- 
relation drops to 0.05) and associated 95% credible intervals in kilometers 
for the red/white/jack pine FTG are 251 (173, 310), 124 (96, 173), 107 
(85, 149), 127 (103, 203), 186 (154, 265), 135 (116, 155), for the intercept, 
PRECIP, TMAX, TMIN, SDI and SNOW, respectively. These long spatial 
ranges help support our initial simplifying assumption to combine the data 
layers into a common pixel resolution. The associated random spatial ef- 
fect surfaces of f3 are again illustrated in Figure 6. Similar strong coefficient 
spatial dependence is seen in the other FTG models. The responsiveness of 
these spatially- varying coefficients to local trends in FTG presence/absence 
provides small errors between observed and fitted values as seen when com- 
paring the left and right columns of Figures 3 and 4. 
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In practice, we can make a pixel- level prediction of the FTG predictive 
distribution wherever the predictor variable set is observed. Then, in a subse- 
quent step using a Geographic Information System (GIS), users can remove 
or mask those pixels that were deemed nonforest from a separate predic- 
tion/classification exercise. 

7. Summary. Analysis of large spatial domains is becoming more com- 
mon, due, in part, to increased access to threaded mathematical libraries 
that can leverage the power of multi-processor computers and improvements 
in dimension reduction methods such as low-rank spatial processes. Only 
through a combination of these tools was the analysis presented here com- 
putationally feasible. With expanding domains of interest comes an increas- 
ing propensity for nonstationarity in the underlying spatial process. From 
a statistical validity standpoint, it is important that we define models that 
are equipped to deal with nonstationarity. From a utilitarian perspective, 
and as seen in our results, addressing nonstationarity can improve model fit 
and, more importantly, prediction. 

Our central interest was to improve FTG prediction given a relatively 
dense array of forest inventory plots and a spatially complete set of environ- 
mental predictor variables. The results suggest that the multinomial logistic 
regression with spatially-varying coefficients is well suited to this objective. 
Further, following a Bayesian approach to model fitting provided estimates 
of the spatial parameters and access to posterior predictive probabilities at 
each new location. It would be difficult to estimate the spatial parameters a 
priori, and even if reasonable estimates could be made, the plug-in approach 
used in traditional methods can provide falsely precise estimates of predicted 
FTG probabilities. This could, in turn, negatively impact end-users sensi- 
tivity analyses. 

Finally, while the current approach apparently meets our stated objec- 
tives, it does not account for underlying structured dependence across cat- 
egories. Investigating and accounting for such dependence can be crucial in 
understanding the relationship between the probability of FTG occurrence 
and environmental variables, for example, how the spatial patterns of FTG's 
joint probabilities will shift with changing temperature or water regimes re- 
sulting from climate change. One approach is to consider spatial versions 
of the multinomial probit models [see, e.g., McCulloch, Poison and Rossi 
(2000)]. This and related issues are of great scientific interest and construct- 
ing models to elucidate these relationships will guide our future research 
effort. 
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SUPPLEMENTARY MATERIAL 

Description of MCMC sampling algorithm and supplementary results. 

(DOI: 10.1214/09- AOAS250SUPP; .pdf). Here we provide a description of 
the Metropolis scheme used to fit the candidate models. Parameter estimates 
for the FTGs are also presented. 
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